KSTEST
Overview
The KSTEST function performs the one-sample Kolmogorov-Smirnov (K-S) test, a nonparametric goodness-of-fit test that assesses whether a sample is consistent with a specified theoretical probability distribution. Named after mathematicians Andrey Kolmogorov and Nikolai Smirnov, who developed it in the 1930s, the test is widely used in statistics to validate distributional assumptions.
The K-S test works by comparing the empirical distribution function (EDF) of the sample data against the cumulative distribution function (CDF) of the reference distribution. The test statistic D_n is defined as the supremum (maximum) of the absolute differences between these two functions:
D_n = \sup_x |F_n(x) - F(x)|
where F_n(x) is the empirical distribution function of the sample and F(x) is the CDF of the theoretical distribution being tested. Intuitively, the statistic captures the largest vertical distance between the sample’s step function and the hypothesized smooth distribution curve.
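The definition above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the SciPy implementation: the helper names `normal_cdf` and `ks_statistic` are hypothetical, and the sample is the one used in Example 1. Because the EDF is a step function, the supremum is attained at a sample point, so it is enough to check the gap just before and just after each step.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf) -> float:
    """Largest vertical gap between the sample's EDF and a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The EDF jumps from i/n to (i+1)/n at x, so check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


sample = [0.1, -0.5, 0.3, -0.2, 0.8, 0.0]
d_n = ks_statistic(sample, normal_cdf)
print(round(d_n, 4))  # 0.3085, the statistic reported in Example 1
```

The largest gap occurs at the smallest observation, x = -0.5, where the theoretical CDF is already about 0.3085 while the EDF is still 0 just before the step.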
This implementation uses the scipy.stats.kstest function from the SciPy library. The reference distribution is identified by its scipy.stats name (for example, norm, uniform, expon, or gamma), so any continuous distribution available in scipy.stats can be tested. Three alternative hypotheses are available: two-sided (the sample's distribution differs from the theoretical distribution), less (the sample's CDF lies below the theoretical CDF for at least one x), and greater (the sample's CDF lies above the theoretical CDF for at least one x).
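To make the three alternatives concrete, the sketch below calls scipy.stats.kstest directly on the sample from Example 4. It assumes SciPy is installed. In SciPy, the `greater` statistic is the largest amount by which the EDF exceeds the theoretical CDF, the `less` statistic is the largest amount by which it falls short, and the `two-sided` statistic is the larger of the two.

```python
from scipy import stats

sample = [0.1, 0.2, 0.3, 0.4]

# Run one test per alternative against the Uniform(0, 1) distribution.
results = {
    alt: stats.kstest(sample, "uniform", alternative=alt)
    for alt in ("two-sided", "less", "greater")
}

for alt, res in results.items():
    print(f"{alt:>9}: statistic={res.statistic:.4f}, p-value={res.pvalue:.4f}")
```

All four values sit in the lower part of the unit interval, so the EDF climbs faster than the uniform CDF. That is why the `greater` statistic (0.6) is much larger than the `less` statistic (0.1).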
The function returns both the K-S statistic and a p-value. A small p-value (typically < 0.05) is evidence against the null hypothesis that the sample comes from the specified distribution. For p-value computation, the function offers several methods: exact uses the exact distribution of the test statistic, approx approximates the two-sided probability as twice the one-sided probability, asymp uses the asymptotic Kolmogorov distribution, and auto selects the most appropriate method based on sample size.
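As a rough illustration of the asymptotic method, the sketch below evaluates the textbook series for the limiting Kolmogorov distribution, Q(λ) = 2 Σ_{k≥1} (-1)^(k-1) exp(-2k²λ²), at λ = √n · D_n. The helper name `kolmogorov_sf` is hypothetical, and the inputs are taken from Example 5 (sample 1, 2, 3, 4 against an exponential distribution with scale 2). It is a sketch of the formula, not of SciPy's internals, and the series converges slowly when λ is close to zero.

```python
import math


def kolmogorov_sf(lam: float, terms: int = 100) -> float:
    """Survival function of the limiting Kolmogorov distribution at lam > 0."""
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    # Clamp to [0, 1] to guard against rounding at the extremes.
    return min(1.0, max(0.0, 2.0 * total))


n = 4
# Statistic for Example 5: the largest gap is at x = 1, where the
# exponential CDF with scale 2 equals 1 - exp(-1/2) and the EDF is still 0.
d_n = 1.0 - math.exp(-0.5)

p_asymptotic = kolmogorov_sf(math.sqrt(n) * d_n)
print(round(p_asymptotic, 4))  # 0.5655, in line with Example 5
```

With only four observations the asymptotic approximation is crude, so the exact method would normally be preferred at this sample size.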
The K-S test is most effective for continuous distributions and requires no assumptions about the underlying data beyond continuity. However, it tends to be less powerful than specialized tests like the Shapiro-Wilk test or Anderson-Darling test when testing for specific distributions such as normality. For more information, see the Kolmogorov-Smirnov test Wikipedia article.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=KSTEST(rvs, kstest_cdf, kstest_args, kstest_alternative, kstest_method)
- rvs (list[list], required): Sample data to test against the theoretical distribution
- kstest_cdf (str, optional, default: "norm"): Name of theoretical distribution to test against
- kstest_args (list[list], optional, default: null): Parameters for the theoretical distribution (e.g., mean and std for normal)
- kstest_alternative (str, optional, default: "two-sided"): Defines the null and alternative hypotheses
- kstest_method (str, optional, default: "auto"): Method for calculating the p-value
Returns (list[list]): 2D list [[statistic, p_value]], or error message string.
Examples
Example 1: Test sample against standard normal distribution
Inputs:
| rvs | | kstest_cdf |
|---|---|---|
| 0.1 | -0.5 | norm |
| 0.3 | -0.2 | |
| 0.8 | 0 | |
Excel formula:
=KSTEST({0.1,-0.5;0.3,-0.2;0.8,0}, "norm")
Expected output:
| statistic | p_value |
|---|---|
| 0.3085 | 0.5191 |
Example 2: Test sample against uniform distribution
Inputs:
| rvs | | kstest_cdf |
|---|---|---|
| 0.1 | 0.2 | uniform |
| 0.3 | 0.4 | |
| 0.5 | 0.6 | |
Excel formula:
=KSTEST({0.1,0.2;0.3,0.4;0.5,0.6}, "uniform")
Expected output:
| statistic | p_value |
|---|---|
| 0.4 | 0.224 |
Example 3: Test normal distribution with custom mean and std
Inputs:
| rvs | | kstest_cdf | kstest_args | |
|---|---|---|---|---|
| 5 | 5.2 | norm | 5 | 0.5 |
| 4.8 | 5.1 | | | |
Excel formula:
=KSTEST({5,5.2;4.8,5.1}, "norm", {5,0.5})
Expected output:
| statistic | p_value |
|---|---|
| 0.3446 | 0.6237 |
Example 4: One-sided test with greater alternative
Inputs:
| rvs | | kstest_cdf | kstest_alternative |
|---|---|---|---|
| 0.1 | 0.2 | uniform | greater |
| 0.3 | 0.4 | | |
Excel formula:
=KSTEST({0.1,0.2;0.3,0.4}, "uniform", , "greater")
Expected output:
| statistic | p_value |
|---|---|
| 0.6 | 0.0337 |
Example 5: Test exponential distribution with all parameters
Inputs:
| rvs | | kstest_cdf | kstest_args | | kstest_alternative | kstest_method |
|---|---|---|---|---|---|---|
| 1 | 2 | expon | 0 | 2 | two-sided | asymp |
| 3 | 4 | | | | | |
Excel formula:
=KSTEST({1,2;3,4}, "expon", {0,2}, "two-sided", "asymp")
Expected output:
| statistic | p_value |
|---|---|
| 0.3935 | 0.5655 |
Python Code
from scipy import stats
import math


def kstest(rvs, kstest_cdf='norm', kstest_args=None, kstest_alternative='two-sided', kstest_method='auto'):
    """
    Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        rvs (list[list]): Sample data to test against the theoretical distribution.
        kstest_cdf (str, optional): Name of theoretical distribution to test against, given as its scipy.stats name (e.g., 'norm', 'uniform', 'expon'). Distributions offered include Normal, Uniform, Exponential, Log-Normal, Beta, Gamma, Chi-Square, t, F, Weibull. Default is 'norm'.
        kstest_args (list[list], optional): Parameters for the theoretical distribution (e.g., mean and std for normal). Default is None.
        kstest_alternative (str, optional): Defines the null and alternative hypotheses. Valid options: 'two-sided', 'less', 'greater'. Default is 'two-sided'.
        kstest_method (str, optional): Method for calculating the p-value. Valid options: 'auto', 'exact', 'approx', 'asymp'. Default is 'auto'.

    Returns:
        list[list]: 2D list [[statistic, p_value]], or error message string.
    """
    def to2d(x):
        # Wrap a scalar so single-cell inputs are handled like 1x1 ranges
        return [[x]] if not isinstance(x, list) else x

    # Normalize rvs to 2D list
    rvs = to2d(rvs)

    # Validate rvs is a 2D list with at least one row
    if not all(isinstance(row, list) for row in rvs) or len(rvs) < 1:
        return "Invalid input: rvs must be a 2D list with at least one row."

    # Flatten rvs to 1D sample data
    try:
        sample_data = [float(item) for row in rvs for item in row]
    except (TypeError, ValueError):
        return "Invalid input: rvs must contain only numeric values."

    if len(sample_data) < 2:
        return "Invalid input: sample must contain at least two values."

    # Validate kstest_cdf is a string
    if not isinstance(kstest_cdf, str):
        return "Invalid input: kstest_cdf must be a string naming a distribution."

    # Parse distribution parameters if provided
    dist_args = ()
    if kstest_args is not None:
        kstest_args = to2d(kstest_args)
        if not all(isinstance(row, list) for row in kstest_args):
            return "Invalid input: kstest_args must be a 2D list or None."
        try:
            dist_args = tuple(float(item) for row in kstest_args for item in row)
        except (TypeError, ValueError):
            return "Invalid input: kstest_args must contain only numeric values."

    # Validate kstest_alternative
    valid_alternatives = ('two-sided', 'less', 'greater')
    if kstest_alternative not in valid_alternatives:
        return f"Invalid input: kstest_alternative must be one of {valid_alternatives}."

    # Validate kstest_method
    valid_methods = ('auto', 'exact', 'approx', 'asymp')
    if kstest_method not in valid_methods:
        return f"Invalid input: kstest_method must be one of {valid_methods}."

    # Get the distribution from scipy.stats
    try:
        distribution = getattr(stats, kstest_cdf)
    except AttributeError:
        return f"Invalid input: '{kstest_cdf}' is not a recognized distribution in scipy.stats."

    # Call scipy.stats.kstest
    try:
        result = stats.kstest(sample_data, distribution.cdf, args=dist_args, alternative=kstest_alternative, method=kstest_method)
        stat = float(result.statistic)
        pvalue = float(result.pvalue)
    except Exception as e:
        return f"Error in scipy.stats.kstest: {e}"

    # Check for nan/inf
    if math.isnan(stat) or math.isnan(pvalue) or math.isinf(stat) or math.isinf(pvalue):
        return "Invalid result: output contains nan or inf."

    return [[stat, pvalue]]
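The core of the wrapper (flatten the 2D inputs, then pass the sample, the distribution's CDF, and its parameters to SciPy) can be checked in isolation. The sketch below reproduces Example 3 without the validation layer. It assumes SciPy is installed, and the variable names simply mirror the wrapper's arguments.

```python
from scipy import stats

# A 2x2 range as Excel would pass it, plus the distribution parameters (mean, std).
rvs = [[5, 5.2], [4.8, 5.1]]
kstest_args = [[5, 0.5]]

# Flatten both 2D lists row by row, as the wrapper does.
sample = [float(v) for row in rvs for v in row]
dist_args = tuple(float(v) for row in kstest_args for v in row)

# Extra positional parameters for the CDF are forwarded through `args`.
result = stats.kstest(sample, stats.norm.cdf, args=dist_args)
print(round(float(result.statistic), 4))  # 0.3446, the statistic in Example 3
```

The order of the values in `kstest_args` matters: for `norm` they are forwarded as loc (mean) followed by scale (standard deviation).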